Newest 'scikit-learn nlp' Questions

0 votes, 0 answers, 39 views

Keep training a PyTorch model on new data

I'm working on a text classification task and have decided to use a PyTorch model for this purpose. The process mainly involves the following steps: Load and process the text. Use a TF-IDF Vectorizer....

Simon

101

asked Aug 30, 2024 at 23:10
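One point that often matters when continuing training: the model's input size is fixed by the TF-IDF vocabulary learned at fit time. A minimal sketch, using scikit-learn only and toy sentences (both assumptions, not the asker's data or code; the PyTorch side is omitted), of reusing the already-fitted vectorizer with `transform` on new text so the feature dimension stays the same:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy data (hypothetical): the original training texts and a later batch.
old_texts = ["the cat sat on the mat", "dogs chase cats"]
new_texts = ["a zebra sat on the mat"]  # "zebra" was never seen during fit

vectorizer = TfidfVectorizer()
X_old = vectorizer.fit_transform(old_texts)  # learns vocabulary and IDF weights
X_new = vectorizer.transform(new_texts)      # reuses them; unseen terms are dropped

# Both matrices have the same number of columns, so a model whose input
# layer was sized to X_old.shape[1] can keep training on X_new.
print(X_old.shape[1], X_new.shape[1])
```

Refitting the vectorizer on the new batch would instead produce a different vocabulary and column order that no longer lines up with the weights already learned. Words that appear only in new data are ignored unless the vocabulary is rebuilt and the model retrained.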

1 vote, 1 answer, 907 views

LLaMA model without using the Hugging Face API

Is it possible to obtain the LLaMA model alone as open-source code, without using the Hugging Face API, so that it can be hosted on our server?

Anagha M P

11

asked Jun 1, 2023 at 12:42

1 vote, 1 answer, 389 views

Text segmentation problem

I am new to ML and trying to solve a text segmentation problem. I have a transcript of a news show and I want to split this transcript into parts by topic. I tried googling and asking ChatGPT and found ...

Oleg Bovykin

13

asked May 30, 2023 at 16:17

0 votes, 0 answers, 129 views

On which texts should TfidfVectorizer be fitted when using TF-IDF cosine for text similarity?

I wonder on which texts should TfidfVectorizer be fitted when using TF-IDF cosine for text similarity. Should TfidfVectorizer be fitted on the texts that are analyzed for text similarity, or some ...

Franck Dernoncourt

5,850

asked Apr 14, 2023 at 2:46
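The two usual options can be compared directly. A minimal sketch with toy sentences (all data here is made up for illustration): option A fits the IDF weights on the texts being compared, option B fits them on a separate reference corpus and only calls `transform` on the texts. The resulting similarities differ because both the IDF weights and the vocabulary itself depend on what the vectorizer was fitted on:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical reference corpus standing in for a larger in-domain collection.
reference_corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stock prices fell sharply today",
    "the market rallied after the report",
]
# The texts whose pairwise similarity we want.
texts = ["the cat sat on the mat", "the cat lay on the mat", "stock prices fell sharply"]

# Option A: fit IDF on the analyzed texts themselves.
vec_a = TfidfVectorizer().fit(texts)
sim_a = cosine_similarity(vec_a.transform(texts))

# Option B: fit IDF on the reference corpus, then only transform the texts.
# Words absent from the reference corpus (here "lay") get no column at all.
vec_b = TfidfVectorizer().fit(reference_corpus)
sim_b = cosine_similarity(vec_b.transform(texts))

print(sim_a.round(2))
print(sim_b.round(2))
```

Option B can give steadier IDF estimates when the analyzed set is small, at the cost of ignoring out-of-vocabulary words; which trade-off is better depends on the application.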

2 votes, 2 answers, 130 views

In sklearn tfidf what is the difference between term frequency and document frequency

Looking at the sklearn tfidf page: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html and trying to understand the difference between term frequency ...

james pow

167

asked Nov 29, 2022 at 17:53

3 votes, 4 answers, 2k views

Accuracy is getting worse after text preprocessing

I'm working on a multi-class text classification project. After splitting the dataset into train and test datasets, I've applied the function below to the train dataset (AKA preprocessing): ...

Ben

209

asked Oct 6, 2022 at 11:50
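The excerpt says the preprocessing function was applied to the train dataset. If the test split is not passed through the same function, the model sees differently formatted text at evaluation time, which is one possible cause of the drop (an assumption, since the function itself is elided). One way to rule that out is to make the preprocessing part of the model, for example via the `preprocessor` parameter of `TfidfVectorizer` inside a pipeline. In this sketch `clean` is a hypothetical stand-in for the question's function and the texts are toy data:

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def clean(text):
    """Hypothetical preprocessing: lowercase, then replace non-letters with spaces."""
    # A custom preprocessor replaces the built-in one, so lowercasing happens here.
    return re.sub(r"[^a-z\s]", " ", text.lower())


train_texts = ["Great product!!", "Terrible, broke fast.", "Loved it :)", "Awful quality..."]
train_labels = [1, 0, 1, 0]
test_texts = ["GREAT quality!", "broke after a week"]

# The same clean() runs on every text passed to fit() and to predict().
model = make_pipeline(TfidfVectorizer(preprocessor=clean), LogisticRegression())
model.fit(train_texts, train_labels)
predictions = model.predict(test_texts)
print(predictions)
```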

3 votes, 1 answer, 574 views

Is there a way to map words to their synonyms in tfidf?

I have the following code: ...

james pow

167

asked Sep 28, 2022 at 18:54
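TfidfVectorizer has no built-in synonym handling, but its `tokenizer` parameter accepts a callable, so one simple approach is to map each token to a canonical form before counting. A minimal sketch: the `SYNONYMS` dictionary is a hypothetical hand-written map (in practice it might be derived from a thesaurus such as WordNet), and the regex mirrors the vectorizer's default `token_pattern`:

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical synonym map: every variant is counted as its canonical term.
SYNONYMS = {"automobile": "car", "auto": "car", "vehicle": "car"}


def tokenize_with_synonyms(text):
    # Same pattern as the default token_pattern (tokens of 2+ word characters).
    tokens = re.findall(r"(?u)\b\w\w+\b", text)
    return [SYNONYMS.get(tok, tok) for tok in tokens]


# token_pattern=None avoids the warning that it is unused when a tokenizer is given.
vectorizer = TfidfVectorizer(tokenizer=tokenize_with_synonyms, token_pattern=None)
X = vectorizer.fit_transform(
    ["I bought a new automobile", "My car is old", "The vehicle broke down"]
)
print(sorted(vectorizer.vocabulary_))
```

Lowercasing still happens before the tokenizer runs (default `lowercase=True`), so the map only needs lowercase keys.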

1 vote, 1 answer, 197 views

Why is max_features ordered by term frequency instead of inverse document frequency

In the docs: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html it is explained that max_features is ordered by ...

james pow

167

asked Sep 28, 2022 at 18:06

0 votes, 1 answer, 233 views

LinearSVC training time with CountVectorizer and HashingVectorizer

I am currently trying to build a text classifier and I am experimenting with different settings. Specifically, I am extracting my features with a CountVectorizer ...

ryuzakinho

145

asked Jul 19, 2022 at 10:52
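When comparing timings, it helps to check what each vectorizer actually hands to LinearSVC. CountVectorizer learns a vocabulary, so its column count equals the number of distinct terms and its values are raw counts; HashingVectorizer keeps no vocabulary, always produces `n_features` columns (2**20 by default), and by default L2-normalizes rows and uses alternating signs. These differences in dimension and scaling may account for part of a timing gap. A minimal sketch on toy data (the small `n_features` is only to keep the example light):

```python
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer
from sklearn.svm import LinearSVC

docs = ["good movie", "bad movie", "great film", "awful film"]
labels = [1, 0, 1, 0]

# Learns a vocabulary: one column per distinct term, raw (unnormalized) counts.
X_count = CountVectorizer().fit_transform(docs)

# Stateless: columns come from hashing tokens into n_features buckets;
# rows are L2-normalized by default (norm="l2").
X_hash = HashingVectorizer(n_features=2**10).fit_transform(docs)

print(X_count.shape, X_hash.shape)  # (4, 6) (4, 1024)

clf_count = LinearSVC().fit(X_count, labels)
clf_hash = LinearSVC().fit(X_hash, labels)
```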

0 votes, 1 answer, 59 views

Optimal clusters for K-means not clear - any ideas?

I have a toy dataset of 10,000 strings of people's names, addresses and birthdays. As a quirk of the data collection process it is highly likely there are duplicate people caused by typos and I am ...

Sandy Lee

267

asked May 4, 2022 at 11:04

1 vote, 0 answers, 18 views

What can be the approaches to merge (ensemble) a non-probabilistic model with RandomForest?

I have an RF for text classification and it gives me accuracy. Almost the same metric is given by another model built using ...

Deshwal

323

asked Jan 18, 2022 at 14:17
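One common option is to give the non-probabilistic model calibrated probabilities with `CalibratedClassifierCV`, after which the two models can be combined by soft voting; another is `VotingClassifier(voting="hard")`, which only needs class predictions. The question does not say what the second model is, so this sketch assumes a `LinearSVC` and uses synthetic features from `make_classification` in place of text features:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import LinearSVC

# Synthetic features standing in for vectorized text (assumption).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# LinearSVC has no predict_proba; the calibration wrapper adds one by
# fitting a sigmoid mapping (the default method) on internal CV folds.
svc = CalibratedClassifierCV(LinearSVC(), cv=3)
rf = RandomForestClassifier(n_estimators=100, random_state=0)

# Soft voting averages the two models' predicted class probabilities.
ensemble = VotingClassifier([("svc", svc), ("rf", rf)], voting="soft")
ensemble.fit(X, y)

proba = ensemble.predict_proba(X[:5])
print(proba.round(3))
```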

4 votes, 1 answer, 1k views

How to perform entity level train-val-test split for NER task?

sklearn provides normal and stratified split options that can be used for ML problems such as multi-class classification. This is relatively easy to do because (1) each sample has one class, ...

Mohit

141

asked Dec 5, 2021 at 8:08

0 votes, 3 answers, 156 views

Creating numeric word representation of input sentences resulting in MemoryError

I am trying to use CountVectorizer to obtain numerical word representations of my data, which is essentially a list of 160,000 English sentences: ...

Mahesha999

299

asked Nov 25, 2021 at 19:43
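The question's code is elided, so this is a guess at a frequent cause: CountVectorizer returns a SciPy sparse matrix, and converting it to a dense array (for example with `.toarray()`) allocates one cell per (sentence, term) pair, almost all of them zero. A sketch with synthetic sentences comparing the two footprints:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Synthetic stand-in for the real corpus: 1,000 short sentences.
sentences = ["this is sentence number %d" % i for i in range(1000)]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(sentences)  # scipy.sparse CSR matrix

# Bytes actually stored by the sparse matrix (values + column indices + row pointers).
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
# Bytes a dense copy would need: one cell per (sentence, term) pair.
dense_bytes = X.shape[0] * X.shape[1] * X.dtype.itemsize

print(X.shape, sparse_bytes, dense_bytes)
```

As a rough scale check: if the vocabulary reached, say, 20,000 terms, a dense int64 matrix for 160,000 sentences would have 3.2 billion cells, about 25.6 GB. Keeping the matrix sparse (most scikit-learn estimators accept it directly) or capping the vocabulary with `max_features` or `min_df` are the usual ways to stay within memory.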

1 vote, 1 answer, 74 views

Which algorithm is best for predicting diseases if symptoms are given? [closed]

After topic modelling with LDA, I get the following dataset as a result. ...

Atom Store

103

asked Oct 29, 2021 at 9:19

3 votes, 1 answer, 1k views

How to identify/recognize that a sentence talks about the future?

Brief Introduction: I have a report/paragraph in which there are sentences with reference to future plans/outlooks/expectations for a particular entity. I want to extract all such sentences for now. ...

Krs

31

asked Oct 13, 2021 at 14:56
